THE UNIVERSITY OF ALBERTA 


RELEASE FORM 


NAME OF AUTHOR DE-LEI LEE 
TITLE OF THESIS FAST ALGORITHMS FOR ASSOCIATIVE MEMORIES
DEGREE FOR WHICH THESIS WAS PRESENTED MASTER OF SCIENCE 
YEAR THIS DEGREE GRANTED 1984 
Permission is hereby granted to THE UNIVERSITY OF 
ALBERTA LIBRARY to reproduce single copies of this 
thesis and to lend or sell such copies for private, 
scholarly or scientific research purposes only. 
The author reserves other publication rights, and
neither the thesis nor extensive extracts from it may
be printed or otherwise reproduced without the author's
written permission.


Digitized by the Internet Archive 
in 2024 with funding from 
University of Alberta Library 


https://archive.org/details/Lee1984 0


THE UNIVERSITY OF ALBERTA 


FAST ALGORITHMS FOR ASSOCIATIVE MEMORIES 
by 


DE-LEI LEE


A THESIS
SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH 
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE 


OF MASTER OF SCIENCE 


DEPARTMENT OF COMPUTING SCIENCE 


EDMONTON, ALBERTA 


SPRING, 1984 




THE UNIVERSITY OF ALBERTA 


FACULTY OF GRADUATE STUDIES AND RESEARCH 


The undersigned certify that they have read, and 
recommend to the Faculty of Graduate Studies and Research, 
for acceptance, a thesis entitled FAST ALGORITHMS FOR 
ASSOCIATIVE MEMORIES submitted by DE-LEI LEE in partial 
fulfilment of the requirements for the degree of MASTER OF
SCIENCE.




Abstract 


This thesis is concerned with the design and 
implementation of efficient algorithms for associative 
memories. A general model of associative memories of m n-bit 
words is assumed, and the time complexity of any algorithm 
under this model is measured in gate delay units. 

A new threshold search algorithm with time complexity 
O(log n) is presented, as compared to O(n), the recent 
result of Ramamoorthy et al. [31]. Based on this algorithm, 
a class of search algorithms with the same time complexity 
is developed. The extremum search algorithm by Frie and 
Goldberg [5] is modified and generalized so that the number 
of memory interrogations is reduced by 30% over the initial 
algorithm in the average case. 

Another new algorithm is proposed for ordered 
retrieval, i.e., sorting. It retrieves k responders in order 
from the associative memory in time O(n+k) which compares
favorably to O(k·log n), the best result by Lewin [7]. Based
on the proposed ordered retrieval algorithm, a fast multiple
response resolver is suggested which resolves k responders
in time O(k + log m). The suggested resolver is faster than
the previous fastest resolver with time complexity
O(k·log m) by Anderson [26] in most cases. Cellular logic
implementations of these algorithms are discussed, and an
analysis of the underlying hardware complexity is given.
CHAPTER 1
INTRODUCTION 


Currently, computers predominantly make use of two 
conceptually different types of memories, random access 
memory and associative memory. The most prevalent is random 
access memory which allows the word to be retrieved by 
address, i.e., physical location in memory. Conventional
computers achieve their generality from this 
location-addressable capability of random access memories; 
however, this generality makes the operations of searching 
and sorting overly time-consuming. If the random access 
memory is being used to store a list of N unordered records, 
where each record contains a fixed number of fields, it will 
take O(N) memory accesses to find a record with a specific 
value in a certain field. The operation can be reduced to 
O(log N) memory accesses by using binary search [16], 
provided the N records are sorted according to the field 
being searched. On the other hand, sorting N records stored 
in the random access memory is a much more costly operation,
since the theoretical lower bound for sorting is O(log N!)
memory accesses and comparisons [16]. Moreover, when all
fields of the record are equally important with respect to a
query, an inverted file system has to be employed in order
to provide the means for binary search [25]. This, however,
requires extra memory space and increases system overhead.

The concept of associative memory or alternatively
content-addressable memory, as originally introduced by
Slade and McMahon [1] in 1956, offers a much better solution
to the above problems. The distinguishing feature of such a
memory is that it allows stored words to be retrieved by 
their contents, or part of their contents. The importance of 
associative memories lies not only in accessing data by 
content but also in doing so in parallel. 

As a result, a searching operation can be done in one
associative memory access. This content-addressable
capability consequently reduces the need for
sorting, particularly in the case where sorting is used to
provide the means for binary search. It also eliminates the
requirement of extra memory space for index files because
searching can now be done equally well for each record
field. Accordingly, system overhead is also minimized [25].

Sorting is now required only for sorting output lists. 
However, some efficient techniques have been discovered that 
take only O(N) accesses of the associative memory to output, 
in order, a stored list of N records. 

Because of the content-addressable characteristic, and 
efficient parallel processing capability, associative memory 
has found its application in various fields, such as 
sorting, relational data searches, pattern recognition, 
machine translation, question-answering systems, job
scheduling and many others [21],[23],[27].

The design and implementation of efficient fundamental
algorithms for associative memories, the subject of this 
thesis, is at the heart of these applications and, as such,
has been a focus of research in the area of associative
processing from the time the field was established.
Furthermore, VLSI technology, with its promise for the near
future, in conjunction with new associative memory
architectures makes the realization of efficient
hardware algorithms for associative memories worth
investigating.

This thesis begins with a description of a general 
organization and some background in which associative 
memories operate, and proceeds, in Chapter 2, to define the 
fundamental searches in the context of an associative memory 
of m n-bit words.

A new scheme for constructing search algorithms for 
parallel associative memories is described in Chapter 3. The 
resulting equivalence searches, threshold searches and 
double-limit searches achieve the time bound of O(log n) 
compared to O(n), the recent result of Ramamoorthy [31]. The 
extremum search algorithm by Frie and Goldberg [5] is 
modified and generalized. It is shown that the modified 
algorithm reduces the number of memory accesses by 30% in 
the average case. 

Chapter 4 addresses the very important, classical
problem posed by retrieval of an ordered list from
associative memories. After briefly reviewing previous
solutions to the problem, an efficient new ordered retrieval
algorithm is proposed along with a cellular logic
implementation. The algorithm is proved to be of time
complexity O(n+k) compared to O(k·log n), the previous
fastest method of Lewin [7], where k is the number of
responders to be retrieved in order.

Chapter 5 deals with the problem of resolving multiple 
responses in associative memories. The fastest multiple 
response resolver by Anderson [26] is examined. A new 
resolver is suggested. An analysis of the efficiency of 
these two resolvers is presented. The proposed resolver is
superior to its competitor in most cases.


CHAPTER 2
BACKGROUND 


This chapter establishes the fundamentals of
associative memories which will be used in subsequent
chapters: the concept and typical functional components of
general associative memories, the characteristics of two
different types of associative memory architectures, and a
formal definition of various associative memory searches.


2.1 The organization of associative memories 

The retrieval of information from a memory requires 
that a particular word of the memory be identified as 
containing the required information. When identification is 
done, the selected word can be read out and/or processed as 
required. According to the way the identification is 
achieved, computer memories fall into two different 
categories. 

In the first category, the conventional random access 
memory accomplishes this identification by a given 
specification of the desired word's physical location or 
address in memory. Associative memory, in the second 
category, accomplishes this identification by specifying 
partial information of the desired word. 

This difference provides associative memory with a
variety of powerful memory operations other than the READ
and WRITE provided by random access memory. This extensive
class of operations, to be precisely defined
in the next section, has made associative memory very
attractive for a wide range of applications.

An abstract associative memory model is schematically
shown in Fig. 1.

Fig. 1. Associative memory model.

The memory unit is a two-dimensional, iterative
configuration of m×n identical cells for storing m words of
n bits each, denoted as B(i) for all i∈{1,...,m}. The value
of the j-th bit of B(i), B(i,j), is stored in the cell (i,j)
at the i-th row and the j-th column, where j=1 is the most
significant bit.

The memory unit is connected to five registers: I, C,
M, s and r. Register I provides the means for buffering the
information to be transferred between the host computing
system and the associative memory. Registers C and M serve
the purpose of specifying the contents or partial contents
of a desired word. In particular, C holds an argument of n
bits called the search key word against which B(i),
i∈{1,...,m}, will be compared in parallel; M has the
ability to mask some portions of the search key word in C.
In other words, any bit of C can be masked as a 'don't care'
state as desired, and only those unmasked bits of C specify
the search criterion. Let C(j) be the value of the j-th bit of
C and M(j) the value of the j-th bit of M. M(j)=1 means C(j)
is unmasked in searches, while M(j)=0 means C(j) is masked
out as the 'don't care' condition. The 'don't care' state
will be denoted throughout this thesis as φ.

Register s is used to select a set of words to be
involved in searches. s(i) denotes the value of the i-th bit
of s, with s(i)=1 indicating that B(i) is in the set.
Register r, on the other hand, is used to store a search 
result, where r(i) represents the value of the i-th bit of 
r. At the end of a search, r(i)=1 indicates that the 
corresponding word B(i) satisfies the search criterion. 
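
The roles of the registers C, M, s and r can be illustrated by a small software model. The following Python sketch is an illustration only, not part of the hardware design; the function name `equality_search` and the list-of-bits representation are assumptions made here for exposition, and the parallel comparison is simulated by an ordinary loop.

```python
def equality_search(words, C, M, s):
    """Software model of a masked equality search.

    words : list of m words, each a list of n bits (index 0 is the most significant bit)
    C     : search key word, list of n bits
    M     : mask, list of n bits (1 = unmasked, 0 = 'don't care')
    s     : word-select register, list of m bits
    Returns the result register r as a list of m bits.
    """
    r = []
    for B_i, s_i in zip(words, s):
        # Only the unmasked bit positions take part in the comparison.
        match = all(b == c for b, c, m_j in zip(B_i, C, M) if m_j == 1)
        r.append(1 if (s_i == 1 and match) else 0)
    return r


words = [[1, 0, 1, 1],
         [1, 0, 0, 1],
         [0, 0, 1, 1],
         [1, 0, 1, 0]]
C = [1, 0, 1, 0]
M = [1, 1, 1, 0]      # the last bit of C is masked out
s = [1, 1, 1, 0]      # the fourth word is not selected
r = equality_search(words, C, M, s)
any_responder = any(r)  # the kind of knowledge a responder detector would supply
print(r, any_responder)
```

In this model the fourth word matches the unmasked key bits but is excluded because its bit in s is 0.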

In some associative operations, at the end of a search, 
a decision regarding the next action is based on the 
knowledge of the presence or absence of any responders, 
i.e., words responding to the search. The functional 
component, denoted as D, is included in Fig. 1 to provide 
this knowledge. 

The component MRR, Multiple Response Resolver, deals
with operations such as read and write. A search of the
memory may yield more than one responder. The multiple
response resolution problem arises when the associative memory
outputs these responders one at a time. To select a
responding word to read out, the MRR generates an m-bit
vector of which there is exactly one element with a value 1
corresponding to the selected word. When this vector is
generated, the selected word can then be read out without
ambiguity. Evidently, the total time required to read out
all responders depends greatly on how fast the vector can be
generated to identify a word each time so that a read
operation can take place. MRR techniques will be discussed
further in Chapter 5.

Finally, the control unit coordinates the above
functional components.
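
The function of the multiple response resolver described above can be sketched in software. The sketch below is a behavioural model only: it assumes, for illustration, that the responder with the lowest index is selected first, a priority rule the text does not specify, and the names `resolve_first` and `read_all` are hypothetical.

```python
def resolve_first(r):
    """Return (one_hot, remaining) for a response vector r.

    one_hot has exactly one 1, at the position of the selected responder
    (here assumed to be the lowest-indexed one); remaining is r with that
    responder cleared.
    """
    one_hot = [0] * len(r)
    remaining = list(r)
    for i, bit in enumerate(r):
        if bit:
            one_hot[i] = 1
            remaining[i] = 0
            break
    return one_hot, remaining


def read_all(words, r):
    """Read out every responder, one per resolution step."""
    out = []
    while any(r):
        select, r = resolve_first(r)
        out.append(words[select.index(1)])
    return out


print(resolve_first([0, 1, 0, 1]))
print(read_all(["w1", "w2", "w3", "w4"], [0, 1, 0, 1]))
```

Each pass of the loop in `read_all` corresponds to one generation of the selection vector followed by one read operation.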

To support the content-addressable concept and the 
parallel processing capability, each word B(i) in the memory 
must have its own hardware comparison logic so that the 
contents of each B(i) can be compared with the value of the 
search key word specified by C and M simultaneously. It is 
now easy to see that associative memory belongs to the class 
of Single Instruction Stream-Multiple Data Stream (SIMD) 
machines, from the architecture viewpoint [27]. The
comparison logic can be viewed as the processing elements of
the SIMD machine, each of which either executes or ignores an
instruction from the control unit on its own data, the
associated word.

According to the way the comparison logic is provided 
to the words, which in turn determines the way searches are 
performed, associative memories can be further classified 
into the following two major categories: 

1) Bit-parallel associative memories; and 

2) Bit-serial associative memories. 

In a bit-parallel associative memory, one bit of
comparison logic is provided for each cell of the memory
unit. In a search, the processing logic of cell (i,j)
compares the contents of B(i,j) with one interrogation bit,
the unmasked C(j). This is done simultaneously for every
cell, and the final decision as to whether or not the word
B(i) satisfies the given search criterion is made by all
cells in the i-th row. Assuming each cell in the memory unit
in Fig. 1 is equipped with comparison logic, then it can be
identified as a bit-parallel associative memory.

The bit-serial associative memory is based on the 
concept of parallel processing with vertical data introduced 
by Shooman [2]. Here, only one bit of comparison logic needs 
to be provided for each word B(i), and exists outside the 
memory unit. Assuming that none of the cells in the memory 
unit of Fig. 1 contains comparison logic, and one bit of 
hardware comparison logic is added together with s(i) and 
r(i) to form the processing element for each B(i) to gain 
the content-addressable capability, then the associative 
memory becomes a bit-serial one. 

A bit-serial associative memory must be capable of
reading all bits in any bit-slice into their corresponding
processing elements in parallel. A bit-slice is made up of
one bit of every word in the memory. The control unit is
responsible for reading the unmasked bit-slices one at a
time and broadcasting their corresponding portion of C one
bit at a time into the processing elements in a synchronized
manner. The storage element r(i) in processing element i is
now used to remember the intermediate comparison state from
one interrogating bit to the next in the course of a search.
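
The bit-serial mode of operation can be made concrete with a small simulation of an equality search. This is a sketch under stated assumptions: the function name `bit_serial_equality` is hypothetical, r(i) is assumed to be initialised from s(i), and one read cycle is charged for each unmasked bit-slice.

```python
def bit_serial_equality(words, C, M, s):
    """Simulate an equality search in a bit-serial associative memory.

    Returns (r, read_cycles): the result register and the number of
    bit-slices that were read.
    """
    m, n = len(words), len(C)
    r = list(s)                     # r(i) holds the intermediate comparison state
    read_cycles = 0
    for j in range(n):
        if M[j] == 0:
            continue                # masked bit-slices are never read
        bit_slice = [words[i][j] for i in range(m)]   # one bit of every word
        read_cycles += 1
        for i in range(m):          # done in parallel by the m processing elements
            r[i] = r[i] & (1 if bit_slice[i] == C[j] else 0)
    return r, read_cycles


words = [[1, 0, 1, 1],
         [1, 0, 0, 1],
         [0, 0, 1, 1],
         [1, 0, 1, 0]]
result, cycles = bit_serial_equality(words, [1, 0, 1, 0], [1, 1, 1, 0], [1, 1, 1, 0])
print(result, cycles)
```

The search time of this model grows with the number of unmasked bit-slices, which is the source of the slower search speed discussed for bit-serial memories.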

The bit-parallel associative memory has the advantage
of a simpler control structure and much faster search speed
at the expense of the additional logic built into every
cell. In contrast, a bit-serial associative memory has a
much slower search speed and a more sophisticated control
structure, but considerably less hardware. The low search
speed of the bit-serial associative memory is essentially 
imposed by the necessity of reading all unmasked bit-slices 
out of the memory unit and processing them outside the 
memory unit sequentially. It is, in fact, a compromise 
between bit-parallel associative memories and random access 
memories. The search time is measured in read cycles for the 
bit-serial associative memory and in gate delays for the 
bit-parallel one. 

The most commonly used and widely discussed searches in
the literature include the following:

 1. equality search            2. inequality search
 3. greater-than search        4. not-greater-than search
 5. less-than search           6. not-less-than search
 7. between-limits search      8. outside-limits search
 9. maximum search            10. minimum search
11. nearest-above search      12. nearest-below search


There are other nonsearch operations that can be
performed in associative memories. These include field
addition [27], summation and product [29], counting [21],
etc. Mechanisms that incorporate nonsearch operations are
referred to as associative processors [28]. Nonsearch
operations will not be further investigated in this thesis.

Before further discussing and analyzing some efficient
associative memory algorithms, the searches summarized above
will be more precisely defined in the next section.


2.2 Definition of searches
Definitions and notation for the searches to be
examined fully in Chapter 3 will now be developed in the
context of the associative memory model depicted in Fig. 1.
Let S and Z be two sets defined as follows:
S = {B(i) | s(i)=1, 1≤i≤m} and Z = {j | M(j)=1, 1≤j≤n}.
According to the magnitude of the unmasked bits of the
search key word held in C, S can be partitioned into three
disjoint subsets denoted by L, E, G respectively:

$$L = \{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - C(j)) < 0,\ B(i) \in S \,\},$$

$$E = \{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - C(j)) = 0,\ B(i) \in S \,\},$$

$$G = \{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - C(j)) > 0,\ B(i) \in S \,\}.$$
jeZ 
The searches can now be defined in terms of S, L, E and
G as follows:

A. Equivalence searches:
1) equality search that produces E,
2) inequality search that produces S - E.

B. Threshold searches:
1) greater-than search that produces G,
2) not-less-than search that produces G∪E,
3) less-than search that produces L,
4) not-greater-than search that produces L∪E.


C. Double-limit searches: 
For two given limits x and y, where x is less than y,
let Lx, Ex, Gx and Ly, Ey, Gy be the sets defined as L, E
and G with x and y respectively held in C. All double-limit
searches can be defined in terms of these sets:
1) between-limit searches that produce
a) Gx ∩ Ly,
b) (Gx ∪ Ex) ∩ Ly,
c) Gx ∩ (Ly ∪ Ey),
d) (Gx ∪ Ex) ∩ (Ly ∪ Ey);
2) outside-limit searches that produce
e) S - (Gx ∪ Ex) ∩ (Ly ∪ Ey),
f) S - Gx ∩ (Ly ∪ Ey),
g) S - (Gx ∪ Ex) ∩ Ly,
h) S - (Gx ∩ Ly).


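D. Extremum searches:
1) maximum search that produces the set

$$\{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - B(k,j)) \ge 0,\ \forall k \ne i \text{ and } B(i), B(k) \in S \,\},$$

2) minimum search that produces the set

$$\{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - B(k,j)) \le 0,\ \forall k \ne i \text{ and } B(i), B(k) \in S \,\}.$$

E. Adjacency searches:
1) nearest-above search that produces the set

$$\{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - B(k,j)) \le 0,\ \forall k \ne i \text{ and } B(i), B(k) \in G \,\},$$

2) nearest-below search that produces the set

$$\{\, B(i) \mid \sum_{j \in Z} 2^{n-j}\,(B(i,j) - B(k,j)) \ge 0,\ \forall k \ne i \text{ and } B(i), B(k) \in L \,\}.$$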
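
The set definitions of this section can be checked against a direct software model. The Python sketch below is illustrative only: it identifies each word B(i) by its index i, computes the masked magnitude with weight 2^(n-j) for bit j (j=1 being the most significant bit), and derives a few of the searches from the partition into L, E and G. The names `masked_value` and `partition` are hypothetical.

```python
def masked_value(bits, M):
    """Sum of 2^(n-j) over the unmasked positions j (1-based) where the bit is 1."""
    n = len(bits)
    return sum(2 ** (n - j) for j in range(1, n + 1) if M[j - 1] and bits[j - 1])


def partition(words, C, M, s):
    """Return (L, E, G) as sets of 1-based word indices taken from the selected set S."""
    key = masked_value(C, M)
    L, E, G = set(), set(), set()
    for i, (B_i, s_i) in enumerate(zip(words, s), start=1):
        if not s_i:
            continue                      # B(i) is not in S
        v = masked_value(B_i, M)
        (L if v < key else E if v == key else G).add(i)
    return L, E, G


words = [[0, 1, 1], [1, 0, 1], [1, 0, 0], [1, 1, 1]]   # values 3, 5, 4, 7
s = [1, 1, 1, 1]
M = [1, 1, 1]

L, E, G = partition(words, [1, 0, 1], M, s)             # key word 5
not_less_than = G | E

# Between-limit search a) Gx ∩ Ly with x = 3 and y = 7.
_, _, Gx = partition(words, [0, 1, 1], M, s)
Ly, _, _ = partition(words, [1, 1, 1], M, s)
between = Gx & Ly
print(L, E, G, not_less_than, between)
```

Masking the least significant bit (M = [1, 1, 0]) makes the words with values 5 and 4 both fall into E for the key word 5, since only their two leading bits are compared.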
CHAPTER 3
FAST SEARCH ALGORITHMS 


Previous search algorithms are discussed in this
chapter. A new algorithm for threshold search is proposed. 
The algorithm is of time complexity O(log n), as compared to 
the time complexity O(n) of the previous fastest algorithm 
[31]. Based on this algorithm, extremum search is achieved 
by a sequence of 0.7n memory interrogations on the average, 
compared to exactly n interrogations conventionally [5]. 
Implementations of double-limit searches and adjacency
searches are sketched, and an analysis of the underlying
hardware complexity is also presented*.


3.1 Previous search algorithms 

Search algorithms have been intensively studied in the 
past. Some important papers discussing general algorithms 
include Gauss [4], Frie and Goldberg [5], Falkoff [8], 
Estrin and Fuller [9], Wolinsky [13], Feng and Lee [17], 
Ramamoorthy [31], and others [24],[28]. 

Early bit-parallel associative memories had only 
equality search in hardware as the basic search, while the 
other searches were achieved by executing a corresponding
sequence of the basic searches.


* Significant parts of this chapter are to appear in IEEE 
Trans. Comput. and Proc. the 1st International Conf. on 
Computers and Applications, 1984 [32],[33]. 
Perhaps the earliest proposal for extremum searches is
due to Frie and Goldberg [5]. The maximum (or minimum)
search was carried out by a sequence of exactly n basic
searches. Also introduced in [5] was an interesting
procedure for adjacency searches which needs, at best, one
basic search and, at worst, 2n-1 basic searches. Algorithms
for threshold and double-limit searches were also proposed
and shown to be basic search sequences of length n and 2n
respectively [8],[9].
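
A maximum search built from exactly n basic equality searches can be sketched as follows. This is the familiar bit-by-bit narrowing scheme, given here as an assumption about how such a sequence may be organised; the details of the procedure in [5] are not reproduced in this chapter, and the names used are hypothetical.

```python
def maximum_search(words, s):
    """Find the selected words of maximum value using one basic search per bit.

    Each iteration models one equality search, masked to a single bit
    position, asking which current candidates hold a 1 in that position.
    Returns (set of 0-based indices of the maxima, number of basic searches).
    """
    n = len(words[0])
    candidates = {i for i, s_i in enumerate(s) if s_i}
    searches = 0
    for j in range(n):                       # most significant bit first
        hits = {i for i in candidates if words[i][j] == 1}
        searches += 1
        if hits:                             # some candidate has a 1 here,
            candidates = hits                # so words with a 0 cannot be maximal
    return candidates, searches


words = [[1, 0, 1, 1],
         [1, 0, 0, 1],
         [0, 0, 1, 1],
         [1, 0, 1, 1]]
maxima, count = maximum_search(words, [1, 1, 1, 1])
print(maxima, count)
```

The number of basic searches equals the word length regardless of the data, which matches the count of exactly n quoted above.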

The major advantage of such a scheme is the low 
hardware cost in building bit-parallel associative memories. 
As shown in [28], the basic cell requires only three gates 
for the comparison logic. It is a disadvantage, however, 
that search speed is uncomfortably slow for those composed 
searches, sixteen search operations out of eighteen. 

A time versus space trade-off exists, to some extent, 
in the design of fast search algorithms for associative 
memories. Having a more powerful basic cell, in terms of 
comparison capability, the majority of search operations can 
be done in O(n) gate delays instead of O(n) basic searches. 
As a matter of fact, a hardware algorithm for a threshold
search has been recently designed and shown to operate in n
gate delays by Ramamoorthy [31]. All double-limit searches
can then be done in 2n gate delays by performing the
threshold search twice. Therefore, fourteen search
operations out of eighteen can be achieved in only n to 2n
gate delays. More interestingly, using the threshold search
as a basic operation provides the possibility to improve the
speed of Frie and Goldberg's method for both extremum and
adjacency searches. A new algorithm to be presented in
Section 3.2 indeed achieves such an improvement. As reported
in [31], the basic cell for the threshold search contains
six gates for the comparison logic which doubles the amount
of comparison logic of the cell in [28]. However, for a
minor hardware investment, the gain is worthwhile.

The need for an extremely fast threshold search 
algorithm is quite obvious due to its tremendous impact on 
the overall performance of other composed searches. Before 
presenting a much faster new algorithm for threshold search, 
Ramamoorthy's algorithm is first examined. 

The associative memory model in Fig. 1 and the related
notation will be used to describe Ramamoorthy's algorithm.
In addition, the symbol E(i,j) denotes the output generated
by cell (i,j-1) which is being taken by cell (i,j) as input.


The algorithm follows: 


Begin
1 C:=search key word;
2 M:=bit mask pattern;
3 r(i):=0, ∀i∈{1,...,m};
4 E(i,1):=s(i), ∀i∈{1,...,m};
5 For j:=1 Until n Do
6 Begin
9 G(i):= d(i,k); (wired-or)
end;
end.

Upon termination of the algorithm, E(i,n+1) together
with G(i) indicate the membership of word B(i) as follows:
1) G(i)=1 ⟹ B(i)∈G;
2) E(i,n+1)=1 ⟹ B(i)∈E;
3) ¬E(i,n+1)·¬G(i)=1 ⟹ B(i)∈L.


The r(i) can be properly set to store the result of a 
search as required. 

The logical realization of the cell to implement steps
7, 8 and 9 of the algorithm is shown in Fig. 2. The delay
from E(i,j) to E(i,j+1) is one gate delay.
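
The serial character of such a scheme can be seen in a behavioural model for a single word. The recurrences below are an assumption made for illustration: an equality signal E is passed from cell to cell starting from s(i), and a local 'greater' signal d is ORed into G. They are not a transcription of the steps of the algorithm above, and the function name is hypothetical.

```python
def serial_threshold(B_i, C, M, s_i):
    """Bit-sequential comparison of one word against the masked key.

    Returns (G, E, delays): G = 1 if the word is greater than the key,
    E = 1 if it is equal, and delays = length of the E chain traversed.
    """
    E = s_i          # E(i,1) := s(i)
    G = 0
    delays = 0
    for b, c, m in zip(B_i, C, M):            # most significant bit first
        d = E & m & b & (1 - c)               # first unmasked mismatch with B=1, C=0
        equal_here = (1 - m) | (1 if b == c else 0)
        G = G | d                             # models the wired-or onto G(i)
        E = E & equal_here                    # E(i,j+1) from E(i,j)
        delays += 1
    return G, E, delays


C = [1, 0, 1, 0, 1, 1, 1]
M = [1, 1, 1, 1, 1, 1, 1]
print(serial_threshold([1, 0, 1, 1, 0, 1, 0], C, M, 1))   # greater than the key
print(serial_threshold(C, C, M, 1))                       # equal to the key
print(serial_threshold([0, 1, 1, 1, 1, 1, 1], C, M, 1))   # less than the key
```

In this model the equality signal must ripple through all n positions before the result is final, which corresponds to a delay proportional to n.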
The following example shows a threshold search of four
words, each seven bits long, where a boxed pair of values,
G over E, represents the state of G(i) and E(i,j+1) enabled by
cell (i,j) at the end of the j-th iteration of the
algorithm.

Example: Threshold search operation (search key word C, bit
mask pattern M, effective search key word, and the words
B(1), ..., B(4) in the memory).

For the result of the search, see the interpretation
following the algorithm.
This algorithm has two interesting properties:
1) full parallelism - each cell (i,j) is capable of
comparing B(i,j) with the unmasked C(j) to produce local
results simultaneously; and
2) full serialism - the final comparison result has to
be carried out by scanning these local results serially,
most significant bit first.
Ramamoorthy's algorithm is, therefore, bit sequential
processing in nature; the time complexity O(n) is the best
that it can do.
In the next section, a new threshold search algorithm
is presented which utilizes parallelism in a more efficient
way, resulting in a much faster algorithm.


3.2 A new threshold search algorithm 

In presenting the new algorithm, the associative memory 
model in Fig. 1 and the related notation will be used. 
Additional notation will be introduced as required. The 
memory word organization is based on the word-tree concept 
which will be explained fully. The definitions and
terminology for trees and their traversal to be used follow


3.2.1 The word-tree concept

Let T(i) be a complete binary tree with n nodes from 
the n cells at the i-th row of the memory unit. 

The mapping of cell (i,j) onto the nodes of T(i) is 
determined by the following two rules: 

Rule 1: number the nodes of T(i) in endorder; and
Rule 2: map cell (i,j) onto T(i,j), where T(i,j) is
the node of T(i) whose number in endorder is j.

As a result, B(i) is stored in a hardware binary tree 
T(i) termed the word-tree. 
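
The two mapping rules can be illustrated for a tree in which every level is full, i.e., n = 2^k - 1. The following Python sketch numbers the nodes in endorder (left subtree, right subtree, then the node itself) and records the sons of each node; the function name `endorder_sons` and the restriction to full trees are assumptions made for this illustration.

```python
def endorder_sons(k):
    """Number a binary tree with k full levels in endorder.

    Returns (sons, root) where sons[j] = (left, right) gives the numbers
    of the sons of node j, or (None, None) for a terminal node.
    """
    sons = {}
    counter = 0

    def visit(level):
        nonlocal counter
        if level == k:                       # terminal node
            counter += 1
            sons[counter] = (None, None)
            return counter
        left = visit(level + 1)              # left subtree is numbered first,
        right = visit(level + 1)             # then the right subtree,
        counter += 1                         # and finally the node itself
        sons[counter] = (left, right)
        return counter

    root = visit(1)
    return sons, root


sons, root = endorder_sons(3)
print(root, sons[7], sons[3], sons[6])
```

For a seven-bit word the root is cell 7, its sons are cells 3 and 6, and the terminal nodes are cells 1, 2, 4 and 5. The right son of node j is always j-1.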

Definition 3.1. WT(i,j), the weight of a node T(i,j),
is 2^(n-j).

Definition 3.2. WTR(i,j), the weight of the right 
subtree TR of a node T(i,j), is the sum of the weights of 
the nodes in TR. 

As an immediate consequence, the following lemma 
describes an interesting property of the word-tree, and 
strongly suggests a fast threshold algorithm given in the 
next section. 

Lemma 3.1: For any node T(i,k) in the left subtree TL,
and T(i,l) in the right subtree TR of T(i,j),
1) WT(i,k) > WTR(i,j) + WT(i,j), and
2) WT(i,l) > WT(i,j).

Proof: From Rule 1, the number assigned to any node in
TL is less than the numbers assigned to those in TR.
Furthermore, any number assigned to the nodes in TR is less
than the one assigned to node j. Without loss of generality,
assuming that TL and TR both have s nodes, then (1) the
maximum number assigned to a node in TL is d+s, (2) the
numbers assigned to nodes in TR are: d+s+1, d+s+2,...,d+2s,
(3) the number assigned to node j is d+2s+1 for some integer
d≥0.

Thus,

$$WTR(i,j) + WT(i,j) = \sum_{t=1}^{s+1} 2^{\,n-(d+s+t)},$$

since,

$$\sum_{t=1}^{s+1} 2^{\,n-(d+s+t)} = 2^{\,n-(d+s)} - 2^{\,n-(d+2s+1)},$$

therefore,

$$\sum_{t=1}^{s+1} 2^{\,n-(d+s+t)} < 2^{\,n-(d+s)}.$$

The right hand side of the inequality is just the
smallest weight of a node in TL; and WT(i,k) is not less
than this weight. The second part of the lemma can be proved
in a similar manner.
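
The lemma can also be checked numerically on small word-trees. The sketch below assumes that node j has weight 2^(n-j), that the tree has k full levels with the root at level 1, and that the sons of a node j at level d are numbered j - 2^(k-d) (left) and j - 1 (right), as follows from endorder numbering. Part 2 is taken to state that every node of TR outweighs node j. The function name `check_lemma` is hypothetical.

```python
def check_lemma(k):
    """Verify both parts of the lemma at every nonterminal node, n = 2^k - 1."""
    n = 2 ** k - 1

    def weight(j):
        return 2 ** (n - j)

    def subtree(j, d):
        """All node numbers in the subtree rooted at node j on level d."""
        if d == k:
            return [j]
        return subtree(j - 2 ** (k - d), d + 1) + subtree(j - 1, d + 1) + [j]

    def holds_at(j, d):
        if d == k:
            return True                      # terminal nodes have no subtrees
        left, right = j - 2 ** (k - d), j - 1
        TL, TR = subtree(left, d + 1), subtree(right, d + 1)
        WTR = sum(weight(t) for t in TR)
        part1 = all(weight(x) > WTR + weight(j) for x in TL)
        part2 = all(weight(x) > weight(j) for x in TR)
        return part1 and part2 and holds_at(left, d + 1) and holds_at(right, d + 1)

    return holds_at(n, 1)


print([check_lemma(k) for k in range(1, 6)])
```

The check is exhaustive over all nonterminal nodes for each tree size tried, but it is a sanity test for small k rather than a substitute for the proof.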


3.2.2 The algorithm 

Without loss of generality, let n equal 2^k-1, and
d∈{1,...,k}. Define Q(d)={j | the level of T(i,j) is d,
1≤j≤n}. In the algorithm that follows, E(i,j-2^(k-d)) and
G(i,j-2^(k-d)) represent the output generated by cell
(i,j-2^(k-d)), which is the left son of cell (i,j) of level d,
and E(i,j-1) and G(i,j-1) the output generated by cell
(i,j-1), which is the right son of cell (i,j). E(i,j) and
G(i,j) are the outputs generated by cell (i,j) to its father. The
algorithm is now presented as Algorithm 3.1.


Algorithm 3.1. Threshold search.

Begin
1. C:=search key word;
2. M(j):=mask bit, ∀j∈{1,...,n};
3. r(i):=0, ∀i∈{1,...,m};
4. E(i,j):=¬M(j) + ¬(B(i,j)⊕C(j)), ∀j∈Q(k);
5. G(i,j):=M(j)·B(i,j)·¬C(j), ∀j∈Q(k);
For d:=k-1 Step -1 Until 1 Do
Begin
6. E(i,j):=E(i,j-2^(k-d))·E(i,j-1)·(¬M(j) + ¬(B(i,j)⊕C(j))),
∀j∈Q(d);
7. G(i,j):=G(i,j-2^(k-d)) + E(i,j-2^(k-d))·(G(i,j-1) +
E(i,j-1)·M(j)·B(i,j)·¬C(j)), ∀j∈Q(d);
End;
End.

At completion of the algorithm, E(i,n), G(i,n) together
with s(i) indicate the membership of B(i) as follows:
1) E(i,n)·s(i)=1 ⟹ B(i)∈E;
Theorem 3.1: Algorithm 3.1 performs a threshold search
on m words of (2^k-1) bits each in time O(k).

Proof: By induction on k. The basis, k=1, is trivial.
Now assume the inductive hypothesis is true for k=s.
Consider the word-tree T(i) with 2^(s+1)-1 nodes representing
B(i) of (2^(s+1)-1) bits. T(i) consists of a root
r having left and right subtrees, TL and TR, with 2^s-1 nodes
each. Invoking the inductive hypothesis and observing steps 6
and 7 of the algorithm (which are based on Lemma 3.1), the
algorithm works correctly for k=s+1.

One pass of the loop of the algorithm requires constant
time (steps 6 and 7) to compute E(i,j) and G(i,j). The loop
is repeated k-1 times to give time complexity O(k).

Q.E.D.
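
The behaviour claimed by the theorem can be simulated for one word. The Python sketch below is a software model under stated assumptions, not the cell logic itself: every cell forms a local 'equal or masked' signal and a local 'greater' signal, and a nonterminal cell at level d combines these with the outputs of its left son j - 2^(k-d) and right son j - 1, the left subtree holding the more significant bits. The names `tree_threshold` and `to_bits` are hypothetical.

```python
def to_bits(value, n):
    """n-bit representation of value, most significant bit first."""
    return [(value >> (n - 1 - j)) & 1 for j in range(n)]


def tree_threshold(B_i, C, M, k):
    """Return (G, E, levels) for a word of n = 2^k - 1 bits stored in a word-tree."""
    n = 2 ** k - 1
    assert len(B_i) == len(C) == len(M) == n

    def cell(j, d):
        b, c, m = B_i[j - 1], C[j - 1], M[j - 1]
        e_local = (1 - m) | (1 if b == c else 0)   # masked or equal
        g_local = m & b & (1 - c)                  # unmasked, B=1 and C=0
        if d == k:                                 # terminal node
            return g_local, e_local, 1
        Gl, El, tl = cell(j - 2 ** (k - d), d + 1) # left son
        Gr, Er, tr = cell(j - 1, d + 1)            # right son
        G = Gl | (El & (Gr | (Er & g_local)))
        E = El & Er & e_local
        return G, E, max(tl, tr) + 1

    return cell(n, 1)


k = 3
ones = [1] * 7
print(tree_threshold(to_bits(0b1011010, 7), to_bits(0b1010111, 7), ones, k))
```

The third component of the result counts the levels of cells traversed, which is k = 3 for seven-bit words, compared with a chain of length seven in a bit-sequential scheme.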

Implementation of the algorithm proceeds as follows. 
The processing logic of the cell is designed to accomplish 
the computation of step 6 and 7 of the algorithm. Fig. 3(a) 
shows a cell module used to implement the node of the 
word-tree. Each cell (i,j) has two incoming lines,
G(i,j-2^(k-d)) and E(i,j-2^(k-d)), from its left son, two incoming
lines, G(i,j-1) and E(i,j-1), from its right son, and two
outgoing lines, G(i,j) and E(i,j), to its father.

A logical realization of the cell is shown in Fig. 3(b).
Accordingly, steps 6 and 7 of the algorithm can be
completed in a period of exactly two gate delays. The amount
of comparison logic of the cell is seven gates, with maximum
fan-in and fan-out of three.

Fig. 3. Searching logic.

Using the cell module as an elementary unit, the memory
unit in Fig. 1 can be constructed by m×n identical cells
with a uniform interconnection.

It should be pointed out that preorder or postorder 
numbering of the nodes of T(i), together with an appropriate 
modification of steps 6 and 7 of the algorithm can serve the 
fast search purpose equally well. 

The following example demonstrates the two-bit outputs
G(i,j) and E(i,j) of cell (i,j) for a threshold search of
four words of seven bits each. Each word is stored in its
corresponding word-tree. In this example, d∈{1,2,3}.
A boxed symbol holding G, E and j represents the state of
G(i,j) and E(i,j)
enabled by cell (i,j) of level d at the end of the (3-d)-th
iteration of Algorithm 3.1. For the result of the search,
see the interpretation following Algorithm 3.1.

Example: Threshold search operation.
The search is accomplished in 3 computation steps,
while Ramamoorthy's algorithm requires 7 steps to do the
same job. As seen from this example, a high degree of
parallelism has been exploited.

An alternative approach to implementing Algorithm 3.1 
makes use of two different kinds of cells for nonterminal 
nodes and terminal nodes of T(i) respectively. The fact that 
the terminal nodes have neither left son nor right son 
allows the removal of gates 4-7 from the original cell. This 
results in another type of cell with only three gates for 
the comparison logic which will lie in the position of 
terminal nodes. This will reduce the hardware complexity. 
Each cell now requires, on the average, no more than five 
gates for the processing logic, because the number of 
terminal nodes is never less than the number of nonterminal 
nodes. 

A natural generalization of the word organization in 
the form of a complete binary tree is in the form of a 
complete k-ary tree. In the complete k-ary tree, each node 
has exactly k sons except for those on the two bottom 
levels, and each level except the last is completely filled 
with nonempty nodes. On the last level, some number of 
rightmost nodes are allowed to be empty. 

Mapping a memory word onto the complete k-ary tree can 
be done in essentially the same way as for the complete 
binary tree. 

Definition 3.3. WT(i,j,s), the weight of the s-th left 
subtree of a nonterminal node T(i,j), is the sum of the 
weights of all nodes in the subtree. 

An argument similar to the proof of Lemma 3.1 can show 
a useful property of the k-ary word-tree below: 

For any node T(i,j,s) in the s-th left subtree, 1≤s<k, 
of a nonterminal node T(i,j) of T(i), 

    W(T(i,j,s)) > \sum_{l=s+1}^{k} WT(i,j,l). 

With the property above a cell module can be designed 
so that k-ary word-tree can be constructed by n identical 
cells with uniform interconnections. Each cell has two 
outgoing lines to its father, and 2k incoming lines from its 
sons. The function of the cell can be realized by two levels 
of logic gates. The resulting cell needs (k+5) gates with a 
maximum fan-in and fan-out of k+1. 

It can be easily seen that the search speed increases 
logarithmically with k, while the hardware complexity of the 
cell is linear with k. However, the analysis of using two 
different kinds of cells to reduce the hardware cost is 
relevant here. In what follows, a formula is derived for the 
hardware complexity when two types of cells are used to 
implement the k-ary word-tree. 

Suppose the k-ary word-tree has n nodes of which there 
are i' nonterminal nodes and i terminal nodes, the cell with 
(k+5) gates is used for the nonterminal node, and the cell 
with 3 gates for the terminal node. The total number of 
gates required for the word-tree is (k+5)·i' + 3i; hence, 
the average cost per cell is 

    \frac{(k+5) \cdot i' + 3i}{n}.                        (1) 

By the definition of complete k-ary tree, there exists 
at most one nonterminal node which has fewer than k sons. 
When this nonterminal node has exactly one son, the 
relationship between i' and i can be easily established as 

    i = (k-1) \cdot (i'-1) + 1.                           (2) 

Substituting i = n - i' back into (2), and solving for i' in 
terms of n and k gives 

    i' = (k+n-2)/k,                                       (3) 

which implies that (n-2) mod k = 0 if and only if there 
exists one nonterminal node which has exactly one son. 

However, when (n-2) mod k ≠ 0, an argument to follow 
shows that 

    i' = \lfloor (k+n-2)/k \rfloor,                       (4) 

where \lfloor x \rfloor denotes the largest integer less 
than or equal to x. 

If (n-2) mod k ≠ 0, then there must exist n', the 
largest integer belonging to {2,3,...,n-1} such that 
(n'-2) mod k = 0. A k-ary tree of n' nodes will have 
(k+n'-2)/k nonterminal nodes of which one, say θ, has 
exactly one terminal node, and n-n'<k. This will guarantee 
that developing a k-ary tree of n nodes from that of n' 
nodes can be done by connecting the (n-n') nodes to θ as 
additional terminal nodes, since the number of sons of θ 
does not exceed k. Therefore, (4) is established. 

Substitution of the previous result back into (1) 
yields the following final form for the average cost per 
cell: 

    3 + \frac{(k+2) \cdot \lfloor (k+n-2)/k \rfloor}{n}. 

In particular, for n=64, k=8, the average cost per cell 
is 4.25 gates, and the search speed is 6 gate delays. 
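
The figure quoted for n=64 and k=8 can be checked numerically. The following is a minimal sketch, assuming the gate counts stated above ((k+5) gates per nonterminal cell, 3 gates per terminal cell) and the nonterminal count of equation (4); the function name is hypothetical. 

```python
def average_gates_per_cell(n: int, k: int) -> float:
    """Average gate count per cell of an n-node complete k-ary word-tree.

    Assumes (k+5) gates per nonterminal cell and 3 gates per terminal cell.
    """
    nonterminal = (k + n - 2) // k          # equation (4): floor((k+n-2)/k)
    terminal = n - nonterminal
    total_gates = (k + 5) * nonterminal + 3 * terminal
    return total_gates / n                  # equation (1)


print(average_gates_per_cell(64, 8))   # 4.25
print(average_gates_per_cell(7, 2))    # binary word-tree of 7 nodes, below 5
```

For the binary case the result stays below five gates per cell, in line with the earlier remark on two cell types. 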

In conclusion, the use of two types of cells to 
implement the k-ary word-tree guarantees that search speed 
increases logarithmically, and hardware complexity decreases 
slightly as k increases. 


3.2.3 Implementation of double-limit searches 

Given the above threshold search, all double-limit 
searches defined in Chapter 2 can be accomplished by 
executing the proposed algorithm twice in succession. As an 
example, in order to obtain the search b) of between-limit 
searches, the not-less-than search is performed with C=x, 
followed by the less-than search with C=y on the resulting 
responders. Other double-limit searches can be obtained in 
the same way. Consequently, all double-limit searches can 
also be performed in time O(log n). 
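
The composition of two threshold searches can be sketched in software. This is only an illustration: the two helper functions stand in for the hardware threshold search (here they simply compare integers), and their names are assumptions, not part of the thesis. 

```python
def not_less_than(words, c):
    """Stand-in for a threshold search: responders are the words >= c."""
    return [w for w in words if w >= c]


def less_than(words, c):
    """Stand-in for a threshold search: responders are the words < c."""
    return [w for w in words if w < c]


def between_limits(words, x, y):
    """Between-limit search x <= w < y as two successive threshold searches."""
    responders = not_less_than(words, x)   # first search, C = x
    return less_than(responders, y)        # second search, C = y, on the responders


print(between_limits([3, 9, 5, 12, 7], 5, 9))   # [5, 7]
```

The second search is applied only to the responders of the first, which is why the total time remains two threshold searches. 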


3.3 A new extremum search algorithm 

The fast speed of the proposed threshold search 
guarantees that extremum searches can be achieved by a 
sequence of threshold searches without introducing undue 
time penalties. Frie and Goldberg's method for extremum 
search needs to perform exactly n equality searches [5]. The 
new extremum search algorithm described here makes use of a 
threshold search sequence instead. It will be shown that the 
number of threshold searches can be expected to be 0.7n. 

To begin with, a maximum search algorithm is presented 
as Algorithm 3.2 to illustrate how the search can be 
achieved by a sequence of threshold searches. 

Algorithm 3.2. Maximum search. 


Begin 
1     M:=0; j:=1; 
2     While j<n Do 
      Begin 
3        C(j):=1; C(j+1):=0; M(j):=1; M(j+1):=1; 
4        Threshold-search; 
5        g := OR of the G outputs of all m words   (wired-OR); 
6        e := OR of the E outputs of all m words   (wired-OR); 
7        If g=1 Then C(j+1):=1 
8        Else If e=0 
         Then 
         Begin 
9           C(j):=0; 
10          Threshold-search; 
11          g := OR of the G outputs of all m words   (wired-OR); 
12          If g=1 Then C(j+1):=1 
         End; 
13       j:=j+2; 
      End; 
End. 

The action of this algorithm can be described briefly 
as follows: step 1 initializes M and j. During the first 
iteration, after step 3 is executed, M has 1's in only the 
leftmost two bits starting from the first bit position. The 
corresponding two bits of C contain 10. Step 4 splits the 
set of candidate words into the three subsets shown below, 
according to the search pattern specified by C and M, 

    11ΦΦ...Φ 

    10ΦΦ...Φ 

    0ΦΦΦ...Φ. 

Step 7 checks if the first subset is empty or not. If it is 
not, the word with the maximum value must have 11 in the 
leftmost two bits. Therefore C(j+1) is updated to 1. 
However, if both the first and second subsets are empty 
(tested in step 8), then the word with maximum value has to 
be in the third subset. Accordingly, C(j) is reset to 0 and 
an extra separation, step 10, is necessary to determine 
which of the following two subsets contains the word with 
maximum value, 

    01ΦΦ...Φ 

    00ΦΦ...Φ. 

Likewise, if the current first subset is not empty (tested 
in step 12), C(j+1) is updated to 1. Step 13 increases j by 
2, after which M(j), M(j+1) are set to 11 and C(j), C(j+1) 
to 10. 

In the succeeding iterations, M and C will specify a 
searching pattern which agrees up to the current (j-1)-th 
bit with the word having the maximum value and has 10 in the 
next two-bit string (the j-th and (j+1)-th bits). The whole 
process is then repeated, and the loop is executed n/2 times. 
Each time, however, the number of searches is not fixed, but 
depends on the distribution of 11, 10, 01, and 00 in each 
two-bit string of the maximum value. Assuming the 
distribution is even, the average number of searches 
for each iteration is 0.25·(1 + 1 + 2 + 2) = 1.5. 
Upon termination of the algorithm, C(1) through C(n) will 
hold a copy of the maximum value. 
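
The two-bits-at-a-time procedure described above can be simulated in software. The sketch below is illustrative only: the hardware threshold search is replaced by an integer comparison on the masked bits, the word length n is assumed to be even, and all function names are hypothetical. 

```python
def threshold_search(words, n, C, M):
    """Software stand-in for the threshold search.

    Returns (g, e): whether some word is greater than, or equal to, the
    comparand C on the bit positions enabled by the mask M.
    C and M are lists indexed 1..n (index 0 unused); bit 1 is the leftmost.
    """
    mask = sum(M[j] << (n - j) for j in range(1, n + 1))
    key = sum(C[j] << (n - j) for j in range(1, n + 1)) & mask
    g = any((w & mask) > key for w in words)
    e = any((w & mask) == key for w in words)
    return g, e


def maximum_search(words, n):
    """Return (maximum value, number of threshold searches); n must be even."""
    C = [0] * (n + 1)
    M = [0] * (n + 1)
    searches = 0
    j = 1
    while j < n:
        C[j], C[j + 1], M[j], M[j + 1] = 1, 0, 1, 1   # try pattern 10
        g, e = threshold_search(words, n, C, M)
        searches += 1
        if g:                       # some candidate has 11 in bits j, j+1
            C[j + 1] = 1
        elif not e:                 # neither 11 nor 10: maximum has 0 in bit j
            C[j] = 0
            g, _ = threshold_search(words, n, C, M)
            searches += 1
            if g:                   # some candidate has 01
                C[j + 1] = 1
        j += 2
    value = sum(C[j] << (n - j) for j in range(1, n + 1))
    return value, searches


print(maximum_search([0b0110, 0b1011, 0b1001], 4))   # (11, 2)
print(maximum_search([0b0001, 0b0100], 4))           # (4, 4)
```

The two runs show the best and worst cases per two-bit string: one search when the string is 11 or 10, two searches when it is 01 or 00. 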

The above algorithm can be generalized so that the 
register M is filled with k 1's at a time to obtain a family 
of maximum search algorithms. In order to evaluate these 
algorithms and select the one for which the number of 
threshold searches required is minimal, the assumption here 
is that the bit patterns 00...0, 00...1, ..., 11...1 are 
evenly distributed in each k-bit string of the maximum 
value. Let I(k) be the number of searches in the average 
case, for each loop of the generalized algorithm, then 

    I(k) = \frac{1}{2^k} [ \sum_{i=0}^{k-2} (i+1) 2^i + (k-1)(2^{k-1} - 1) + 2k ] 
         = \frac{1}{2^k} [ \sum_{i=0}^{k-2} i 2^i + \sum_{i=0}^{k-2} 2^i + (k-1)(2^{k-1} - 1) + 2k ] 
         = k - \frac{3}{2} + \frac{k+2}{2^k}. 

Therefore, to implement the maximum search requires a 
sequence of I(k)·n/k threshold searches. I(k)/k is the ratio 
of the number of searches required by the proposed algorithm 
and that by Frie and Goldberg's algorithm [5]. When k=3, the 
ratio is 0.7 which proves to be the most efficient choice of 
all. The number of memory interrogations can thus be reduced 
approximately by 30% in the average case. 
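
The choice k=3 can be checked by tabulating the ratio I(k)/k. The sketch below assumes the closed form I(k) = k - 3/2 + (k+2)/2^k, which is taken here as an assumption; it agrees with the two values given in the text (1.5 searches per loop for k=2, and a ratio of about 0.7 for k=3). The function name is hypothetical. 

```python
from fractions import Fraction


def searches_per_loop(k: int) -> Fraction:
    """Assumed closed form for I(k), the average searches per k-bit string."""
    return Fraction(k) - Fraction(3, 2) + Fraction(k + 2, 2 ** k)


# Ratio of threshold searches to the n equality searches of the older method.
ratios = {k: searches_per_loop(k) / k for k in range(1, 9)}
best_k = min(ratios, key=ratios.get)

for k, r in ratios.items():
    print(k, float(r))
print("best k:", best_k)
```

Under this assumption the ratio is 1 for k=1, 0.75 for k=2, about 0.708 for k=3, and rises again from k=4 onward. 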

The minimum search can be done, on the same basis, with 
the same time complexity. Adjacency searches can be 
achieved by following a threshold search with an extremum 
search. 


3.4 Conclusion 

The search algorithms proposed here are quite 
efficient. The threshold searches and double-limit searches 
have significantly improved the time complexity over 
searches of the same type in the literature, from O(n) to 
O(log n). An improvement over Frie and Goldberg's extremum 
search algorithm has been achieved. Implementations of the 
proposed algorithms have been suggested. 


CHAPTER 4 
EFFICIENT ORDERED RETRIEVAL ALGORITHMS 


The problem of retrieving an ordered list of k words 
from associative memories of m n-bit words is addressed in 
this chapter. An efficient new algorithm is proposed along 
with a cellular logic implementation. The algorithm is of 
time complexity O(n+k). By contrast, the fast algorithms by 
Lewin [7] and Ramamoorthy [31] are of time complexities 
O(k·log n) and O(k·n), respectively. 


4.1 Previous ordered retrieval algorithms 

The ordered retrieval of a set of words from an 
associative memory has long been an interesting research 
problem [5],[6],[7],[11],[12],[31]. The first proposal is 
due to Frie and Goldberg [5]. Independently, Seeber and 
Lindquist [6] postulated another scheme which utilized a 
more complicated reporting mechanism to provide the 
knowledge regarding whether or not there exists exactly one 
responder to a previous search. The method in [6] requires, 
in most cases, fewer memory accesses than that of [5] to 
retrieve, in order, the same number of responders from an 
associative memory. However, the inherent characteristic 
of these two algorithms is that the number of memory 
accesses is dependent on n [7],[11]. 

The use of an associative memory with a Bit-Slice Value 
Indicator (BSVI) was first introduced by Lewin [7], and has 
achieved the most efficient ordered retrieval algorithm thus 
far proposed. The BSVI is a hardware mechanism which has the 
ability to determine, for each bit-slice, whether all words 
of interest store the same bit in this bit-slice or not. In 
other words, BSVI is capable of distinguishing a mixture of 
1's and 0's from either only 1's or only 0's stored in that 
bit-slice. As proven by the inventor, the number of memory 
accesses necessary to retrieve k responders in order from an 
associative memory of m n-bit words is exactly 2k-1, 
independent of n. A simple proof of Lewin's ordered 
retrieval theorem can be found elsewhere [13]. 

Some efforts have been made in the past to further 
reduce the number of memory accesses of Lewin's algorithm by 
introducing a more powerful BSVI. As reported by Miller 
[12], when the BSVI is also able to determine, for each 
bit-slice, whether or not there exists exactly one distinct 
bit stored in the bit-slice, the number of memory accesses 
can be reduced to, at best, k+1 and, at worst, 2k-1. 
Unfortunately, Miller's method does not necessarily retrieve 
the k responders in an ascending or descending order; it can 
only be used for multiple response resolution rather than 
sorting. 

Further utilizing the power of the BSVI, 
Ramamoorthy has designed the fastest algorithms for extremum 
searches [31]. Of particular interest, the maximum and 
minimum searches can be performed on the memory 
concurrently. As a consequence, k responders can be isolated 
in k/2 memory accesses. However, each memory access of 
Ramamoorthy's algorithm carries out extremum searches 
requiring O(n) gate delays [31], while that of Lewin's only 
performs an equality search. The dominant part of 
Lewin's algorithm is the time required to locate the 
leftmost bit-slice with a mixture of 1's and 0's stored by 
all selected words each time before the equality search can 
take place. This locating process is equivalent to the 
problem of selecting the first logic 1 in a binary vector of 
length n, and takes time O(log n) to complete, even if the 
best algorithms [14],[26] are assumed. 

Measured in terms of gate delays, the time complexities 
of Lewin's algorithm and Ramamoorthy's algorithm are 
O(k·log n) and O(k·n), respectively. Clearly, Lewin's 
algorithm behaves much better. 

The disadvantage with these two fast algorithms lies in 
the fact that the time period between isolating two adjacent 
responders is not fixed, but depends on the values of the 
words under consideration. 

In what follows, a new ordered retrieval algorithm is 
presented for an associative memory with BSVI. With this 
algorithm, the first responder is located in time O(n), and 
each of the remaining in O(1). 


4.2 A new ordered retrieval algorithm 


4.2.1 System overview 
The simplified associative memory organization shown in 
Fig. 4 is used to implement the new ordered retrieval 
algorithm to be presented. 

Fig. 4. A simplified associative memory. 

As can be seen, the register C has been ignored because 
no search key word is needed here. The register M is used to 
mask bit-slices as required. Only those bit-slices for which 
the corresponding M(j)'s are 1's will be included in the 
ordered retrieval process. There is another binary matrix A 
with A(i,j) residing in cell (i,j), which can be visualized 
as interleaved with the binary matrix B column by column. 
B(i,1) precedes A(i,1), and A(i,n) follows B(i,n) for all 
i, 1≤i≤m. 

The s(i) will be denoted throughout this chapter as 
A(i,0) for descriptive purposes only. Initially, matrix A is 
reset, and the words to be retrieved in descending order are 
specified by A(i,0)=1, for some i∈{1,...,m}. 

Informally, it is now instructive to view A(1,j), 
A(2,j), ..., A(m,j), 1≤j≤n, as m paths each of n units long, 
and the ordered retrieval algorithm as a one-way traffic 
system, where each logic 1 initially injected into A(i,0), 
for some i∈{1,...,m}, marches in path i toward the extreme 
right end one unit per step. In this regard, each cell (i,j) 
functions as an intelligent traffic light on path i at a 
position j units away from A(i,0) and is capable of 
permitting or inhibiting the passage of the marching logic 1 
through it, based on the information carried by bit-slice j. 
The system is designed in such a way that the order in which 
the logic 1's reach the very right end is the desired 
reading sequence of their associated words. Moreover, after 
n-2 steps, the logic 1's start reaching the very right end 
in that order at the rate of one arrival every two steps. 

In presenting the algorithm more precisely, the 
following terminology is useful and introduced in the 
context of the simplified associative memory. Let 
S'={B(i)|A(i,j)=1, 1≤i≤m and 0≤j≤n} be a set of words to be 
read out of the memory in descending order in the course of 
ordered retrieval. For any word B(i)∈S', define a function 
P(B(i)) to be an integer c∈{0,1,...,n} such that 
A(i,P(B(i)))=1. 

Definition 4.1. The effective j-th bit-slice is made up 
of the j-th bit of every word B(i) for which A(i,j-1)=1, 
1≤i≤m, 1≤j≤n. 

Definition 4.2. The j-th bit-slice is said to be busy 
if there exists i, i∈{1,...,m}, such that A(i,j)=1. 

Definition 4.3. Two words B(l), B(s) are said to be 
adjacent, if P(B(l)) > P(B(s)), and there does not exist 
B(k)∈S' such that P(B(l)) > P(B(k)) > P(B(s)). 

Example: Consider the memory unit below, where each cell 
(i,j) is drawn as a box labelled ij containing a and b for 
B(i,j) and A(i,j), respectively: 

[Diagram: a memory unit of six words of three bits each, 
cells 11 through 63, with A(2,0)=0, A(4,0)=1 and A(6,0)=0.] 

a) The effective 1st, 2nd, and 3rd bit-slices are now 
[three bit-slice columns shown in the diagram], 
respectively. 

b) Only the 1st bit-slice, among the three, is busy. 

c) Since P(B(2))=P(B(3))=1, and P(B(4))=P(B(5))=0, both 
words B(2) and B(3) are adjacent to B(4) and B(5). 

The BSVI required here has the following 
characteristics. First, it is capable of detecting whether 
or not there exists at least one bit in the effective j-th 
bit-slice whose value is 1. Second, it is able to determine 
whether or not the j-th bit-slice is busy. 

Using the "wired-or" technique, such a detection can be 
done in constant time [31]. 

The j-th component of the bit-slice value indicator and 
the associated bit-slice are schematically illustrated in 
Fig. 5, where h(i,j)=A(i,j-1)·B(i,j), and v(i,j)=A(i,j). 

It can be readily seen from Fig. 5 that the j-th 
bit-slice is busy if ¬V(j)=0, and that at least one bit of 
the effective j-th bit-slice is 1 if ¬H(j)=0. 

In referring to the previous example, the following 
equations hold: ¬H(1)=1, ¬V(1)=0; ¬H(2)=0, ¬V(2)=1; and 
¬H(3)=1, ¬V(3)=1. 
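
The two indicator signals can be mimicked in software. The following is a minimal sketch, assuming the matrices A and B are held as lists of rows with column 0 of B unused; the function name and the small sample memory are hypothetical and are not the example of the text. 

```python
def bsvi_signals(A, B, j):
    """Return (not_H, not_V) for bit-slice j, mirroring the active-low lines.

    A[i][c], c = 0..n, holds the marker bits A(i,c); B[i][c], c = 1..n,
    holds the data bits B(i,c) (B[i][0] is unused padding).
    """
    m = len(A)
    H = any(A[i][j - 1] and B[i][j] for i in range(m))  # a 1 in the effective slice
    V = any(A[i][j] for i in range(m))                  # slice j is busy
    return int(not H), int(not V)


# Hypothetical 3-word, 2-bit memory: words 0 and 1 are marked in column 0,
# word 2 is already registered in column 1.
A = [[1, 0, 0], [1, 0, 0], [0, 1, 0]]
B = [[0, 0, 1], [0, 0, 0], [0, 1, 1]]
print(bsvi_signals(A, B, 1))   # (1, 0): no 1 in the effective slice, slice busy
print(bsvi_signals(A, B, 2))   # (0, 1): a 1 in the effective slice, slice free
```

In hardware both signals are obtained in constant time by the wired-OR; the loops here only model that reduction. 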


4.2.2 The algorithm 

The heart of the ordered retrieval algorithm is a 
parallel operation called partition which is presented below 
as Algorithm 4.1. 


Algorithm 4.1. Partition. 


Begin 
1     V(j) := OR over i=1,...,m of A(i,j)             (wired-OR); 
2     H(j) := OR over i=1,...,m of A(i,j-1)·B(i,j)    (wired-OR); 
3     p(i,j) := A(i,j-1)·(B(i,j) + ¬H(j) + ¬M(j)); 
4     If V(j)=0 and p(i,j)=1 Then 
      Begin 
5        A(i,j):=1; 
6        A(i,j-1):=0; 
      End; 
End. 


The essence of Algorithm 4.1 is to simultaneously 
partition each class of words B(i) indicated by A(i,j-1)=1, 
for ∀i∈{1,...,m} and ∀j∈{1,...,n}, into two subclasses based 
on the information carried by the effective j-th bit-slice. 
And if the j-th bit-slice is not busy at that moment, then 
the words B(l) in the bigger magnitude subclass can be 
separated from the words B(s) in the smaller magnitude 
subclass by means of setting A(l,j), and resetting A(l,j-1). 
As a result, the words B(l) and B(s), which were in the same 
class before the effective j-th bit-slice was examined, are 
now registered by A(l,j)=1 and A(s,j-1)=1 into two 
subclasses respectively. 

In referring to the partition algorithm shown, steps 1 
and 2 compute V(j) and H(j) for all j∈{1,...,n} 
simultaneously. V(j)=1 implies that the j-th bit-slice is 
busy. H(j)=1 indicates that there exists at least one bit in 
the effective j-th bit-slice whose value is 1. Step 3 
assigns each p(i,j) a logic value. It may be noted from the 
statement in line 3 that p(i,j) will have the value 0 if 
either A(i,j-1)=0, or H(j)=1, B(i,j)=0 and the effective j-th 
bit-slice is unmasked. It is apparent that the word B(i) 
with A(i,j-1)=1 and p(i,j)=1 is greater than the word B(k) 
with A(k,j-1)=1 and p(k,j)=0. Moreover, if the j-th bit-slice 
is free and B(i) belongs to the bigger magnitude subclass, 
checked by step 4, then the states of A(i,j-1) and A(i,j) 
are interchanged by steps 5 and 6. This interchange 
effectively separates the words in the bigger magnitude 
subclass from those in the smaller magnitude subclass. It is 
also possible that there does not exist any p(i,j)=0, 
implying that the smaller magnitude subclass is empty. In 
this case after steps 5 and 6 are executed, the (j-1)-th 
bit-slice will be released. 

When the j-th bit-slice is busy, however, no separation 
can be done, because no room is currently 
available to register the words in the bigger magnitude 
subclass. Therefore all words B(i) indicated by A(i,j-1)=1 
have to remain in the original class as if the effective 
j-th bit-slice were not examined. 

An implementation of the partition algorithm in each 
cell is shown in Fig. 6. Each cell (i,j) communicates with 
its left neighbor by means of A(i,j-1) and p(i,j), and with 
its right neighbor by means of A(i,j) and p(i,j+1). Lines 
¬h(i,j) and ¬v(i,j) are coupled to the j-th component of the 
BSVI as depicted in Fig. 5. Each cell will receive a global 
timing clock, denoted as P here, to set/reset A(i,j). The 
partition operation can be accomplished in a constant period 
of time, 5 gate delays. 

The ordered retrieval algorithm is now presented as 
Algorithm 4.2. 

Fig. 6. Partition logic. 

Algorithm 4.2. Ordered Retrieval. 


Begin 
1     A(i,j):=0, for ∀i∈{1,...,m} and ∀j∈{1,...,n}; 
2     While V(n)=0 Do Partition; 
3     Read out B(i) indicated by A(i,n)=1; 
4     Partition; 
5     A(i,n):=0; 
6     While V(n-1)=1 Do 
      Begin 
7        Partition; 
8        Read out B(i) indicated by A(i,n)=1; 
9        Partition; 
10       A(i,n):=0; 
      End; 
End. 


Algorithm 4.2 is achieved essentially by performing a 
sequence of partition operations to isolate all words in S' 
by their contents, and read them out one at a time without 
ambiguity. It is assumed that all words in the memory are 
different in order to identify them uniquely. 

indicates word B(i) in S'. The algorithm terminates when all 
responders have been read out of the memory. In other words, 


4.2.3 An analysis of the algorithm 

In order to analyze the time complexity, and verify the 
validity of Algorithm 4.2, three lemmas are first 
established concerning the effectiveness of a sequence of 
partition operations on the words in S'. 


Lemma 4.1: The relation P(B(l)) > P(B(s)) is not 
affected by any partition to be executed. 

Proof: Consider the following two cases. 

CASE 1. Suppose P(B(l)) - P(B(s)) = 1. 

P(B(s)) can be increased by 1 iff steps 5 and 6 are 
executed. However, steps 5 and 6 cannot be executed in this 
case, because V(P(B(l))) equals 1. 

CASE 2. Suppose P(B(l)) - P(B(s)) > 1. 

P(B(s)) can be increased at most by 1 after steps 5 and 
6 are executed. However, this increment by no means changes 
the relation P(B(l)) > P(B(s)). 


Lemma 4.2: B(l) > B(s) if P(B(l)) > P(B(s)). 

Proof: There are only two ways in which 
P(B(l)) > P(B(s)) can occur after a partition P is executed. 

CASE 1. Before executing P, P(B(l)) = P(B(s)) which 
implies that B(l,j) = B(s,j) for j=1,2,...,P(B(l)). After P 
is executed, condition P(B(l)) > P(B(s)) implies 
B(l,P(B(l))) = 1 and B(s,P(B(l))) = 0 for the increased 
P(B(l)). Therefore, B(l) > B(s) if P(B(l)) > P(B(s)) 
follows. 

CASE 2. Before executing P, P(B(l)) > P(B(s)), which 
recursively implies that separating B(l) from B(s) was due 
to an earlier partition P'. In other words, there exists an 
h such that B(l,j) = B(s,j), for j=1,2,...,h-1, and 
B(l,h)=1, B(s,h)=0; the separation was done by P' based on 
the above information carried by the effective h-th 
bit-slice. By Lemma 4.1, B(l) > B(s) if P(B(l)) > P(B(s)) 
follows. 


Lemma 4.3: P(B(l)) - P(B(s)) ≤ 2 if B(l) and B(s) are 
adjacent. 

Proof: The lemma can be proven by showing that B(l) and 
B(s) are not adjacent if P(B(l)) - P(B(s)) = k, k>2, by 
induction. 

The basis, k=3. There must exist a partition P such 
that P(B(l)) - P(B(s)) = 2 before P takes place. At 
completion of P, P(B(l)) - P(B(s)) = 3 implies that P(B(l)) 
is increased by one but not P(B(s)). However, P(B(s)) 
remains unchanged only when either V(P(B(s))+1) = 1 or 
p(s,P(B(s))+1) = 0 during the execution of P. The first 
condition implies that B(l) and B(s) cannot be adjacent due 
to the existence of B(k) such that P(B(l))>P(B(k))>P(B(s)), 
and that they can never become adjacent, after the execution 
of P, by Lemma 4.1. The second condition implies that there 
exists at least one word B(k') such that P(B(k')) = P(B(s)) 
and p(k',P(B(s))+1) = 1. Consequently, B(l) and B(s) will be 
no longer adjacent, because P(B(l)) > P(B(k')) > P(B(s)) 
must occur after P is executed. 

Now assume that the inductive hypothesis is true for 
all values up to k≥3. Consider P(B(l)) - P(B(s)) = k+1. 
Again, there must be a partition P' such that 
P(B(l)) - P(B(s)) = k before it takes place. By the 
inductive hypothesis, B(l) and B(s) cannot be adjacent. By 
Lemma 4.1, they can never become adjacent, even after P' is 
executed. 


In what follows, it is assumed that the read operation 
transmitting the contents of a single word B(i) indicated by 
A(i,n)=1 into the register I can be done in constant time 
[28]. Also it is assumed that this read is executed in 
parallel with the partition following it in Algorithm 4.2 
even though it has been written in a serial manner. The time 
complexity of Algorithm 4.2 will be measured in terms of the 


number of partition operations. 


Theorem 4.1: The number of partitions required by 
Algorithm 4.2 to retrieve k responders in descending order 


from the associative memory of m n-bit words is n+2k-1. 


Proof: After step 1 is executed, the words in S' are 
indicated by A(i,0)=1 only. None of the bit-slices 1 through 
n is busy before a partition is invoked. 

The number of iterations of the first While loop in 
line 2 is n. In other words, to isolate the first responder 
from the rest of them requires n partitions. The fact that 
bit-slices j through n are all free before the j-th 
iteration takes place, where 1≤j≤n, guarantees that there 
must exist one word, say B(i), being registered by A(i,j) 
with j increasing each time a partition is executed. At the 
completion of the n partitions, exactly one A(i,n) 
associated with this word is set, which indicates that the 
word can be read out immediately without ambiguity. 

After step 4 is executed, by Lemma 4.3, A(i',n-1), for 
some i'e{1,...,m}, will be set to register words B(i') which 
are adjacent to the word indicated by A(i,n)=1. Step 5 
resets A(i,n) making the n-th bit-slice no longer busy. 

The number of iterations of the next While loop is k-1. 
During the first iteration, after step 7 is performed, a new 
B(i) will be registered by A(i,n)=1, and be read out 
subsequently. Step 9 guarantees that the words adjacent to 
B(i), say B(i'), will be registered by A(i',n-1)=1 after it 
takes effect. Step 10 clears A(i,n), discarding this B(i) 
from further consideration and releasing the n-th 
bit-slice. In the succeeding iteration of the loop, the 
words B(i') are registered by A(i',n-1)=1, the n-th 
bit-slice is free, and all words in S' satisfy the property 
specified by Lemma 4.3. The loop will be repeated k-1 times 
to exhaust the remaining k-1 responders. 

By Lemma 4.2, the k words are retrieved in descending 
order. 

Taking into account the total cost above, that is, n 
partitions for the first loop, one for step 4, and two for 
each of the k-1 iterations of the second loop, gives that 
n+2k-1 partitions are necessary. 
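
The partition count can be observed on a small software model. The sketch below is an illustration under stated assumptions, not the cellular hardware: the parallel cell updates are modelled by collecting all moves from the old state before applying them, all bit-slices are unmasked, the words are assumed distinct, at least one word is selected, and the function names are hypothetical. 

```python
def partition(A, B, M):
    """One parallel partition step in the manner of Algorithm 4.1 (A updated in place)."""
    m, n = len(B), len(M) - 1
    moves = []
    for j in range(1, n + 1):
        V = any(A[i][j] for i in range(m))                  # slice j busy?
        H = any(A[i][j - 1] and B[i][j] for i in range(m))  # a 1 in effective slice j?
        if V:
            continue
        for i in range(m):
            if A[i][j - 1] and (B[i][j] or not H or not M[j]):   # p(i,j) = 1
                moves.append((i, j))
    for i, j in moves:          # apply together: the cells act simultaneously
        A[i][j], A[i][j - 1] = 1, 0


def ordered_retrieval(words, n):
    """Return (words in descending order, number of partitions performed)."""
    m = len(words)
    # B[i][j] is bit j of word i, j = 1 (leftmost) .. n; column 0 is padding.
    B = [[0] + [(w >> (n - j)) & 1 for j in range(1, n + 1)] for w in words]
    M = [0] + [1] * n
    A = [[1] + [0] * n for _ in range(m)]       # every word selected by A(i,0)=1
    out, count = [], 0

    def busy(j):
        return any(A[i][j] for i in range(m))

    def step():
        nonlocal count
        partition(A, B, M)
        count += 1

    def read_partition_clear():                 # steps 3-5 and 8-10
        i = next(i for i in range(m) if A[i][n])
        out.append(words[i])
        step()
        A[i][n] = 0

    while not busy(n):                          # step 2
        step()
    read_partition_clear()
    while busy(n - 1):                          # step 6
        step()                                  # step 7
        read_partition_clear()
    return out, count


print(ordered_retrieval([5, 3, 6], 3))          # ([6, 5, 3], 8)
```

With n=3 and k=3 the model performs 8 partitions, which agrees with n+2k-1. 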

The correctness of the algorithm follows by induction 
on k. 

Q.E.D. 


4.3 Conclusion 

An efficient ordered retrieval algorithm has been 
presented which retrieves k responders in descending order 
from an associative memory of m n-bit words in O(n+k) steps. 
It possesses the salient feature that after the first 
responder is obtained, each of the remaining can be found in 
constant time. 


CHAPTER 5 
FAST MULTIPLE RESPONSE RESOLUTION ALGORITHMS 


The problem of multiple response resolution is 
discussed in this chapter. Alternative techniques for 
solving this problem are briefly reviewed with the emphasis 
on Anderson's method [25]. A new method is outlined based on 
the fast ordered retrieval algorithm proposed in Chapter 4. 
A comparison is then made between Anderson's method and the 
proposed one. 


5.1 Previous results 

Consider the associative memory organization shown in 
Fig. 1. A search of the associative memory, in general, 
separates the stored m words into two subsets of which one 
contains all the words satisfying the search criteria. The 
words in this subset are also sometimes called responders. 
Let s, defined in Chapter 2, be this subset, i.e., s(i)=1 
indicates that B(i) is a responder. Multiple response 
resolution occurs when the associative memory must read out 
these responders one at a time for the purpose of output. To 
select a responder to read, the associative memory must be 
able to choose from the m-bit vector, register s, an element 
s(i) which is logic 1. In other words, it must be able to 
generate from s an m-bit vector R of which only one element 
is 1 corresponding to the selected responder. Identified by 
the vector R, the selected word can be read out without 
ambiguity. How fast the vector R can be generated will 
dominate the output speed of associative memories because 
the vector R has to be generated as many times as the number 
of responders. 

Several fast methods for generating the vector R have 
been proposed in the past, and they can be grouped into two 
classes according to whether or not further associative 
searches on the responders are required. The first class, 
including methods due to Frie and Goldberg [5], Seeber and 
Lindquist [6], Lewin [7] and Ramamoorthy [31], uses further 
associative searches to retrieve the responders in an 
ordered sequence according to their contents. The responders 
are supposed to be distinct so that no further multiple
response conflict can occur. 

The second class, comprising those of Weinstein [10],
Koo [20], Foster [14], Anderson [26] and Landis [29],
utilizes priority circuitry to retrieve them in an ordered
sequence according to their physical location in the memory.
It is desired in this case to select sequentially all logic
1's in register s one at a time. The simplest way of doing
this, as mentioned by Weinstein [10], is to scan serially all
bits of the register s. If s(i) is not the first logic 1,
a logic 0 will be generated for R(i), inhibiting read out of
B(i). The scheme requires O(m) gate delays to generate the
vector R, which will defeat the main advantage of a parallel
search of associative memories when m becomes large.
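
The serial scan can be illustrated with a small software model. The sketch below is only a behavioural illustration, not a gate-level design: the function names serial_resolve and read_out_all and the variable inhibit are hypothetical, and register s is assumed to be given as a list of 0/1 values. Each loop iteration stands for one stage of the serial chain, which is where the O(m) delay comes from.

```python
def serial_resolve(s):
    """Return the vector R: 1 only at the first logic 1 of s, else 0.

    The loop visits the bits of s in physical order, so the cost grows
    linearly with the number of words m.
    """
    r = [0] * len(s)
    inhibit = 0  # becomes 1 once the first responder has been passed
    for i, bit in enumerate(s):
        r[i] = bit & (1 - inhibit)  # R(i) = s(i) AND NOT inhibit
        inhibit |= bit
    return r


def read_out_all(s):
    """Read out every responder, one at a time, in physical order."""
    s = list(s)  # work on a copy of register s
    order = []
    while any(s):
        r = serial_resolve(s)
        i = r.index(1)   # the single word selected by R
        order.append(i)  # stands for reading out B(i)
        s[i] = 0         # reset s(i) so B(i) is not selected again
    return order
```

For instance, with s = [0, 1, 0, 1, 1] the first call to serial_resolve selects only position 1, and read_out_all visits positions 1, 3 and 4 in that order.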




An alternative method with time complexity O(log m) was 
first proposed by Weinstein using tree structured logic to
generate the vector R [10]. Weinstein's resolver operates in
two phases: in the first, pulses propagate up the tree,
setting storage elements on the way, and in the second,
these storage elements steer a downwards propagating pulse
to the first of the responders.

Later, a faster multiple response resolver embodying 
lookahead logic was presented by Foster [14]. Foster's
resolver uses much less hardware than Weinstein's and
generates the vector R approximately two times as fast. 
Foster's scheme was then generalized by Landis [29] to 
achieve a resolving speed about 20 percent faster. The most 
efficient approach using tree structured logic for 
generating the vector R is due to Anderson [26]. Anderson's 
resolver can speed up Foster's scheme at least by a factor 
of two, and becomes the fastest among the second class thus 
far developed. 

It has been widely believed that Foster's scheme is 
much faster in absolute speed than the ordered retrieval 
methods [29],[31]. The fast ordered retrieval algorithm 
proposed in Chapter 4, however, is very promising for 
resolving multiple responses. Such a resolver is suggested 
in Section 5.3. It is then only necessary to compare it
with Anderson's resolver. The following section is therefore
devoted to describing Anderson's method.


5.2 Anderson's Multiple Response Resolver

Fig. 7 shows Anderson's tree structured resolver. The 
tree is constructed with identical logic blocks of 4 input 
pairs and 1 output pair. The function of each block is to 
perform the resolution operation when enabled to do so by a 
parent block above it, and to inform the parent of any 1 
bits (activity) on its inputs. When its parent enables the
resolution, an indication vector is passed to its descendent
blocks (or to the vector R), enabling their resolution 
operation. 

Each interconnecting pair is comprised of an activity 
signal propagating up the tree to the root block and an
indication signal propagating down. The pairs at the 
terminal nodes of the tree are the register s and the vector 
R, while the pair at the root are the system level activity 
and enable signals. 

Fig. 8 shows the circuit Anderson calls the
"P-generator" block. The A00 - A11 lines represent inputs of
activity from four lower locations in the tree, four bits of
s or four descendent blocks depending on the level. If any
of them are 1, then the A-out line going up the tree is
energized. The P00 - P11 lines represent outputs of the
indication vector to four lower locations in the tree.

Fig. 7. Anderson's resolver structure.

Fig. 8. P-generator block.

The time for generating the R vector is calculated as
follows. For a memory of m words, Anderson's tree will have 
(log m)/2 levels. The activity signal must propagate upwards 
in the tree through (log m)/2 levels for the root block to 
generate the enable signal. This signal has to propagate 
downwards in the tree through (log m)/2 levels to generate 
the R vector. This gives (log m) unit delays in total. The 
time complexity of Anderson's method for k responders is, therefore,
O(k·log m). It should be noted that the major advantages of
Anderson's resolver are its very fast speed and its cellular
structure.
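
The two-phase operation of a tree of 4-input blocks can be sketched in software as follows. This is a behavioural model under stated assumptions, not Anderson's gate-level P-generator circuit: the name tree_resolve is hypothetical, the system level enable is assumed to be always asserted, and s is padded with zeros up to a power of four. The activity signals are computed level by level going up, and the indication signals are then steered down to the first active input of each enabled block.

```python
def tree_resolve(s, fan_in=4):
    """Return (R, levels) for a tree of blocks with fan_in inputs each."""
    m = len(s)
    size, levels = fan_in, 1
    while size < m:  # number of levels needed to cover m words
        size *= fan_in
        levels += 1
    padded = list(s) + [0] * (size - m)

    # Upward phase: each block reports activity if any of its inputs is 1.
    activity = [padded]
    while len(activity[-1]) > 1:
        prev = activity[-1]
        activity.append([int(any(prev[j:j + fan_in]))
                         for j in range(0, len(prev), fan_in)])

    # Downward phase: an enabled block enables only its first active input.
    enable = [1]  # assumption: the system level enable is asserted
    for level in range(levels, 0, -1):
        inputs = activity[level - 1]
        nxt = [0] * len(inputs)
        for b, en in enumerate(enable):
            if en:
                for j in range(b * fan_in, (b + 1) * fan_in):
                    if inputs[j]:
                        nxt[j] = 1
                        break
        enable = nxt
    return enable[:m], levels
```

For m = 4096 the model uses 6 levels, in agreement with the (log m)/2 levels stated above, so one pass up and one pass down traverse log m = 12 block levels in total.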


5.3 A new multiple response resolver 

The ordered retrieval algorithm introduced in Chapter 4 
is directly applicable to solving the problem of multiple 
response resolution. A tag field is included for each word, 
where each tag is a distinguishable number of length log m. 
When multiple responses occur, each tag serves as a number 
for the ordered retrieval algorithm. The tags involved in 
the ordered retrieval are those corresponding to the 
response words. Only the bit-slices composing the tag field 
are used in the process. Consequently, this scheme requires
O(log m + k) steps to resolve k responders.
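
The role of the tag field can be illustrated with the sketch below. It is not the ordered retrieval algorithm of Chapter 4, which is not reproduced in this section. As a stand-in it uses a plain bit-slice minimum search over the tags, which needs about log m slice steps for every responder and therefore does not reach the O(log m + k) bound. The name tagged_read_out is hypothetical, and the tag of each word is assumed to be its physical index.

```python
def tagged_read_out(s):
    """Read out responders in increasing tag order using tag bit-slices."""
    m = len(s)
    width = max(1, (m - 1).bit_length())  # tag length, about log2(m) bits
    tags = list(range(m))                 # assumption: tag = word index
    active = [bool(b) for b in s]         # responders still to be read
    order = []
    while any(active):
        cand = list(active)
        # Examine one tag bit-slice per step, most significant bit first.
        for bit in range(width - 1, -1, -1):
            slice_ = [(tags[i] >> bit) & 1 for i in range(m)]
            # If some candidate has a 0 here, candidates with a 1 drop out.
            if any(c and not b for c, b in zip(cand, slice_)):
                cand = [c and not b for c, b in zip(cand, slice_)]
        i = cand.index(True)  # tags are distinct, so one candidate is left
        order.append(i)
        active[i] = False     # discard the word just read
    return order
```

Because the tags are distinct, every search ends with exactly one candidate, so no further multiple response conflict can arise among the tags.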


5.4 Comparison with Anderson's resolver

The distinguishing feature of the multiple response 
resolver outlined in the previous section is that the vector 
R can be generated in constant time for each of the 
responders except the first one. This implies that it is
potentially attractive for large associative memories. It 
would be of great value to evaluate the proposed method in 
terms of Anderson's approach which has been widely regarded 
as the fastest one. 

In order to compare these two schemes precisely, it is
assumed that (1) reading of a selected word into register I
can be done in constant time [28] and (2) I is made up of D
type flip-flops.

For a memory of size m, Anderson's resolver requires 
log m gate delays to generate the vector R. Once the vector
R has been generated, one word, say B(i), indicated by 
R(i)=1 can be read into register I. s(i) is then reset, 
discarding B(i) from further consideration. The process is 
repeated until all responders have been exhausted.

Reading a selected word B(i) and generating another 
vector R for the next word can be overlapped in time. They
cannot, however, start at the same time, for the following 
reasons. First, s(i) has to be reset to guarantee that the 
vector R to be generated would not locate the same B(i) 
again. Second, resetting s(i) has to be done in such a way 
that the currently used vector R cannot be disturbed before
register I can receive B(i) properly. It is reasonable to
assume that generating the next vector R can be started 4
gate delays later than the read operation, where 4 is due to
the setup time and hold time for the flip-flops as well as
the reset time for s(i). Further suppose a memory read
operation takes less than (log m + 4) gate delays; the
total time required by Anderson's method to resolve k
responders is therefore k·(log m + 4) gate delays. The total
time needed by the proposed resolver to do the same task is
5(log m + 2k - 1). Consequently, when k is greater than
5(log m - 1)/(log m - 6), the proposed resolver takes less time
than Anderson's scheme does. Table I gives a list of values 
for log m, k' and the ratio k'/m, where
k' = 5(log m - 1)/(log m - 6). As the table implies, the
proposed method becomes more efficient than its competitor
when the memory size is large. For example, when m = 4096,
the new resolver always takes less time than its competitor
as long as k is greater than 9. For the purpose of output,
however, to read a list of more than 9 out of the 4096 words
would be the most likely case.
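
The break-even point follows from requiring k·(log m + 4) > 5(log m + 2k - 1), which rearranges to k·(log m - 6) > 5(log m - 1). The short sketch below evaluates the two cost expressions and the threshold k' exactly as they are stated in the text. The function names are hypothetical, the argument log_m denotes log m to base 2, and the code does not reproduce the entries of Table I.

```python
from fractions import Fraction


def anderson_delays(log_m, k):
    """Total cost of Anderson's method: k * (log m + 4)."""
    return k * (log_m + 4)


def proposed_delays(log_m, k):
    """Total cost of the proposed resolver: 5 * (log m + 2k - 1)."""
    return 5 * (log_m + 2 * k - 1)


def crossover(log_m):
    """k' = 5(log m - 1)/(log m - 6), meaningful only for log m > 6."""
    if log_m <= 6:
        raise ValueError("no positive break-even point when log m <= 6")
    return Fraction(5 * (log_m - 1), log_m - 6)


# Example from the text: m = 4096, so log m = 12.
k_prime = crossover(12)
costs_at_9 = (anderson_delays(12, 9), proposed_delays(12, 9))
costs_at_10 = (anderson_delays(12, 10), proposed_delays(12, 10))
```

For log m = 12 the threshold is k' = 55/6, slightly above 9. At k = 9 the two costs are 144 and 145, while at k = 10 they are 160 and 155, so the proposed resolver is the faster one from k = 10 onwards. When log m is 6 or less the inequality has no positive solution for k, and under these cost expressions Anderson's method remains the faster one.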


CHAPTER 6
CONCLUSION 


Three major types of algorithms for parallel 
associative memories have been investigated in this thesis. 
Efficient algorithms for searching, ordered retrieval as
well as multiple response resolution have been proposed. 
Each of the new algorithms has been evaluated in terms of 
the best algorithm of the same type in the literature. A 
comparison of the results has shown that the proposed
algorithms are superior in most cases.